-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ByteString implementation #148
Conversation
eafb2dc
to
6a535f8
Compare
08c8f2f
to
77f27a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor questions, mainly regarding API parity with stdlib String
/ByteArray
4684c07
to
43a9e76
Compare
9b44bd9
to
30e76b0
Compare
bytestring/common/src/ByteString.kt
Outdated
* @param string the string to be encoded. | ||
*/ | ||
public fun ByteString.Companion.fromUtf8String(string: String): ByteString { | ||
return wrap(string.encodeToByteArray()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encodeToByteArray
behaves differently on JVM and Native when encoding invalid code points.
It may be fine, but primitives from the code module behave similarly on all targets (and the behavior is the same as encodeToByteArray
's on JVM).
So we have to either hoist UTF-8 support from the code module into a separate module and reuse it here or change the behavior in the core module to match stdlib's behavior.
Either way, I'd rather postpone a fix and discuss it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, does it make sense to create an YouTrack issue on Kotlin side mentioning this inconsistency? Specifically in scope of ongoing efforts on stabilising K/N runtime/stdlib APIs and behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: K/JS behaviour of encodeToByteArray
is the same as in K/N (not sure about K/WASM). In that case only K/JVM behaviour is differs from other platforms.
And now in kotlinx-io
we adopting K/JVM behaviour.
While it could be fine, it's a little inconsistent
30e76b0
to
7702cbd
Compare
} | ||
|
||
/** | ||
* Compares a byte sequence wrapped by this byte string to a byte sequence wrapped by [other] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is worth adding either a more detailed description or examples of exactly how the comparison works.
If I am not mistaken, the lexicographic order does not strictly define the relation of values such as (1, 2, 3) ? (1, 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll mention that the behavior is similar to String::compareTo
and then add a sample comparing different byte strings in a separate PR (for #134)
7702cbd
to
5e0fb14
Compare
|
||
import kotlinx.io.bytestring.ByteString | ||
import kotlinx.io.bytestring.buildByteString | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed it during the original review.
We typically prefer extensions for class Foo
either placed in the same file (-> classFooKt
) for small and compact additions, or in plural form of the entity -- Foos
(Channels.kt
, Strings.kt
, Serializers.kt
). Bit nicer stacktraces, a bit more consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also use @file:JvmMultifileClass
for files spread across platforms, but maybe it's a bit too redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing it!
|
||
perPackageOption { | ||
suppress.set(true) | ||
matchingRegex.set(".*unsafe.*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's worth excluding it from the official documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, it looks like a Forbidden fruit - we're asking not to use it by any means and then making it visible to everyone.
Currently, we use ByteString.decodeToString and String.encodeToByteString as names for conversion methods between String and ByteString, where non-parameterized functions convert to/from UTF-8, and JVM-specific extensions accept Charset. At the same time, the core module uses different naming for methods reading/writing UTF-8 string and methods reading/writing strings using specific Charset: - Source.readUtf8/Sink.writeUtf8 to work with UTF-8 strings - Source.readString/Sink.writeString to work with strings using the given Charset on JVM. The naming is inconsistent and it seems reasonable to unify read/write methods naming with what we have for ByteString. We can use readString/writeString w/o charset for UTF-8 strings (as these are treated as default in the library) and same-titled JVM-specific extensions accepting a Charset.
Implemented ByteString - an immutable wrapper around byte sequence and added base support to the core IO library.
Missing stuff:
Closes #133